Information Theoretical Clustering via Semidefinite Programming
Abstract
We propose convex optimization techniques for information theoretical clustering. The clustering objective is to maximize the mutual information between data points and cluster assignments. We first formulate this problem as an instance of max-k-cut on weighted graphs. We then apply semidefinite programming (SDP) relaxation to obtain a convex SDP problem. We show how the solution of the SDP problem can be further improved with a low-rank refinement heuristic. The low-rank solution reveals the cluster structure of the data more clearly. Empirical studies on several datasets demonstrate the effectiveness of our approach. In particular, the approach outperforms several other clustering algorithms when compared on standard evaluation metrics.
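As a rough illustration of the max-k-cut objective mentioned in the abstract, the sketch below scores a cluster assignment by the total weight of edges that cross clusters, then finds the best assignment by exhaustive search on a tiny graph. The weight matrix `W` and the function names are made up for this example; the abstract does not say how the edge weights are derived from mutual information, and the exhaustive search is only a stand-in for the SDP relaxation, which exists precisely because this search is infeasible beyond very small inputs.

```python
from itertools import product


def cut_weight(weights, labels):
    """Total weight of edges whose endpoints lie in different clusters."""
    n = len(labels)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] != labels[j]
    )


def brute_force_max_k_cut(weights, k):
    """Try all k**n labelings and keep the first one with the largest cut."""
    n = len(weights)
    best_labels, best_value = None, float("-inf")
    for labels in product(range(k), repeat=n):
        value = cut_weight(weights, labels)
        if value > best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


# Hypothetical symmetric weights: large values mean "these two points
# should be separated". Points 0,1 are close, as are points 2,3.
W = [
    [0, 1, 9, 8],
    [1, 0, 7, 9],
    [9, 7, 0, 2],
    [8, 9, 2, 0],
]

labels, value = brute_force_max_k_cut(W, k=2)
print(labels, value)
```

On this toy input the maximum cut separates points 0 and 1 from points 2 and 3. An SDP relaxation replaces the discrete labels with a positive semidefinite matrix variable so that the problem becomes convex; that step is not shown here.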
Similar resources
Ensemble Clustering using Semidefinite Programming
We consider the ensemble clustering problem where the task is to 'aggregate' multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D...
A family of norms with applications in quantum information theory II
We consider the problem of computing the family of operator norms recently introduced in [1]. We develop a family of semidefinite programs that can be used to exactly compute them in small dimensions and bound them in general. Some theoretical consequences follow from the duality theory of semidefinite programming, including a new constructive proof that for all r there are non-positive partial...
Advanced Optimization Laboratory Title: Approximating K-means-type clustering via semidefinite programming
One of the fundamental clustering problems is to assign n points to k clusters based on the minimal sum-of-squares (MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP) problem. We show that our 0-1 SDP model provides a unified framework for several clustering approaches such as normalized k-cut and spectr...
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k-disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we e...
Approximating K-means-type Clustering via Semidefinite Programming
One of the fundamental clustering problems is to assign n points to k clusters based on the minimal sum-of-squares (MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP) problem. We show that our 0-1 SDP model provides a unified framework for several clustering approaches such as normalized k-cut and spectr...